AITopics | spatial description

spatial description

Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs

Zhang, Yue, Ma, Tianyi, Wang, Zun, Qiao, Yanyuan, Kordjamshidi, Parisa

arXiv.org Artificial Intelligence, Sep-30-2025

Integrating large language models (LLMs) into embodied AI models is becoming increasingly prevalent. However, existing zero-shot LLM-based Vision-and-Language Navigation (VLN) agents either encode images as textual scene descriptions, potentially oversimplifying visual details, or process raw image inputs, which can fail to capture abstract semantics required for high-level reasoning. In this paper, we improve the navigation agent's contextual understanding by incorporating textual descriptions from multiple perspectives that facilitate analogical reasoning across images. By leveraging text-based analogical reasoning, the agent enhances its global scene understanding and spatial reasoning, leading to more accurate action decisions. We evaluate our approach on the R2R dataset, where our experiments demonstrate significant improvements in navigation performance.

artificial intelligence, large language model, natural language

arXiv:2509.25139

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Li, Rong, Li, Shijie, Kong, Lingdong, Yang, Xulei, Liang, Junwei

arXiv.org Artificial Intelligence, Dec-5-2024

3D Visual Grounding (3DVG) aims to locate objects in 3D scenes based on textual descriptions, which is essential for applications like augmented reality and robotics. Traditional 3DVG approaches rely on annotated 3D datasets and predefined object categories, limiting scalability and adaptability. To overcome these limitations, we introduce SeeGround, a zero-shot 3DVG framework leveraging 2D Vision-Language Models (VLMs) trained on large-scale 2D data. We propose to represent 3D scenes as a hybrid of query-aligned rendered images and spatially enriched text descriptions, bridging the gap between 3D data and the input formats of 2D VLMs. The framework has two modules: the Perspective Adaptation Module, which dynamically selects viewpoints for query-relevant image rendering, and the Fusion Alignment Module, which integrates 2D images with 3D spatial descriptions to enhance object localization. Extensive experiments on ScanRefer and Nr3D demonstrate that our approach outperforms existing zero-shot methods by large margins. Notably, we exceed weakly supervised methods and rival some fully supervised ones, outperforming the previous state of the art (SOTA) by 7.7% on ScanRefer and 7.1% on Nr3D.

computer vision, information, spatial information

arXiv:2412.04383

Country:

Asia > Singapore (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Generating Visual Spatial Description via Holistic 3D Scene Understanding

Zhao, Yu, Fei, Hao, Ji, Wei, Wei, Jianguo, Zhang, Meishan, Zhang, Min, Chua, Tat-Seng

arXiv.org Artificial Intelligence, May-25-2023

Visual spatial description (VSD) aims to generate texts that describe the spatial relations of given objects within images. Existing VSD work models only 2D geometric visual features and thus inevitably suffers from a skewed spatial understanding of the target objects. In this work, we investigate incorporating 3D scene features into VSD. Using an external 3D scene extractor, we obtain the 3D objects and scene features for input images, from which we construct a target object-centered 3D spatial scene graph (Go3D-S2G) that models the spatial semantics of target objects within the holistic 3D scene. In addition, we propose a scene subgraph selection mechanism that samples topologically diverse subgraphs from Go3D-S2G, where navigating the diverse local structure features yields spatially diversified text generation. Experimental results on two VSD datasets demonstrate that our framework significantly outperforms the baselines, especially in cases with complex visual spatial relations. Our method also produces more spatially diversified descriptions. Code is available at https://github.com/zhaoyucs/VSD.
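As a rough illustration of what edges in a target-object-centered spatial graph might look like, here is a minimal sketch that assigns coarse relation labels between a target object and each neighbour from 3D centroids. It is not the paper's Go3D-S2G construction: the `Object3D` type, the `relation` and `target_centered_graph` helpers, the relation vocabulary, the axis convention (x to the right, y as depth away from the camera, z up) and the `margin` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Object3D:
    # Hypothetical object record; axis convention is an assumption:
    name: str
    x: float  # to the right of the camera
    y: float  # depth: distance in front of the camera
    z: float  # height above the floor


def relation(a: Object3D, b: Object3D, margin: float = 0.1) -> str:
    """Coarse relation of a relative to b, using the axis of largest offset.

    The label set and the margin are illustrative assumptions.
    """
    offsets = (("x", a.x - b.x), ("y", a.y - b.y), ("z", a.z - b.z))
    axis, delta = max(offsets, key=lambda pair: abs(pair[1]))
    if abs(delta) < margin:
        return "next to"  # objects too close on every axis to pick a direction
    labels = {
        "x": ("right of", "left of"),
        "y": ("behind", "in front of"),  # larger depth = farther from camera
        "z": ("above", "below"),
    }
    positive, negative = labels[axis]
    return positive if delta > 0 else negative


def target_centered_graph(target: Object3D, others: list[Object3D]) -> list[tuple[str, str, str]]:
    """Star-shaped graph: one (target, relation, other) edge per neighbour."""
    return [(target.name, relation(target, other), other.name) for other in others]
```

For example, a cup at height 1.0 directly over a table at height 0.7 gets the edge `("cup", "above", "table")`. Picking only the dominant axis keeps the sketch short; a real system would likely allow several relations per pair and use full 3D boxes rather than centroids.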

machine learning, natural language, object-oriented architecture

arXiv:2305.11768

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Rail (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.35)

Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

Chen, Howard, Suhr, Alane, Misra, Dipendra, Snavely, Noah, Artzi, Yoav

arXiv.org Artificial Intelligence, Nov-29-2018

We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a real-life visual urban environment to a goal position, and then identify in the observed image a location described in natural language to find a hidden object. The data contains 9,326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays richer use of spatial reasoning compared to related resources. Empirical analysis shows the data presents an open challenge to existing methods.

artificial intelligence, machine learning, natural language

arXiv:1811.12354

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

What is not where: the challenge of integrating spatial representations into deep learning architectures

Kelleher, John D., Dobnik, Simon

arXiv.org Artificial Intelligence, Jul-21-2018

This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. On the basis of the evaluation of examples of generated captions from the literature we argue that systems capture what objects are in the image data but not where these objects are located: the captions generated by these systems are the output of a language model conditioned on the output of an object detector that cannot capture fine-grained location information. Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects.

artificial intelligence, machine learning, neuron

arXiv:1807.08133

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Sweden > Västra Götaland > Gothenburg (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Investigating Spatial Language for Robot Fetch Commands

Skubic, Marjorie (University of Missouri) | Alexenko, Tatiana (University of Missouri) | Huo, Zhiyu (University of Missouri) | Carlson, Laura (University of Notre Dame) | Miller, Jared (University of Notre Dame)

AAAI Conferences, Jul-21-2012

This paper outlines a study that investigates spatial language for use in human-robot communication. The scenario studied is a home setting in which an elderly resident has misplaced an object, such as eyeglasses, and a robot helps the resident find it. We present results from phase I of the study, in which we investigate spatial language generated for a human addressee or a robot addressee in a virtual environment, and highlight differences between younger and older adults. Drawing on these results, we discuss the capabilities a robot will need, such as an approach that handles the varying perspectives speakers use and the recognition of furniture items for use as spatial references.

furniture item, robot, spatial description

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Missouri (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Aligned Scene Modeling of a Robot's Vista Space — An Evaluation

Swadzba, Agnes (Bielefeld University) | Wachsmuth, Sven (Bielefeld University)

AAAI Conferences, Aug-8-2011

One kind of meaningful structure in indoor rooms is the supporting structure, such as tables and cupboards. A robot needs to know these structures to interact naturally with humans and the environment. As bottom-up detection of such structures is a challenging problem, we propose to estimate potential supporting structures from a spatial description like "a bowl on the table". As language and cognition schematize space in the same way, it is possible to estimate the representation of space underlying a scene description. To do so, we introduce the aligned modeling approach, which consists of rules transforming a sequence of object relations into a set of trees, and a methodology to ground the abstract representation of the scene layout in the current perception using detectors for small movable objects and an extraction of planar surfaces. An analysis of 30 descriptions shows the robustness of our approach to a variety of description strategies and object detection errors.
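To make "transforming a sequence of object relations into a set of trees" concrete, here is a minimal sketch that turns (figure, relation, ground) triples into a forest of support trees, whose roots are candidate supporting structures such as a table. It is not the authors' rule set: the triple format, the choice to keep only "on" relations, and the function name `build_support_forest` are assumptions, and the grounding step (object detectors and planar-surface extraction) is omitted entirely.

```python
from collections import defaultdict


def build_support_forest(relations):
    """Turn (figure, relation, ground) triples into a forest of support trees.

    Only "on" relations are treated as support (an assumption of this sketch).
    Returns {root: nested dict of supported objects}, where each root supports
    something but is never itself described as resting on anything.
    """
    children = defaultdict(list)  # ground object -> objects resting on it
    supported = set()             # objects described as resting on something
    for figure, rel, ground in relations:
        if rel != "on":
            continue  # e.g. "in", "next to" are ignored here
        children[ground].append(figure)
        supported.add(figure)

    def subtree(node, seen):
        if node in seen:  # guard against cyclic descriptions
            return {}
        seen = seen | {node}
        return {child: subtree(child, seen) for child in children[node]}

    # Compute roots before recursion, which may add empty keys to `children`.
    roots = [node for node in children if node not in supported]
    return {root: subtree(root, frozenset()) for root in roots}
```

For the description "a bowl on the table, a spoon on the table, a cup on the cupboard", this yields two trees rooted at `table` and `cupboard`, i.e. two hypothesized supporting structures that a perception module could then try to ground.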

artificial intelligence, relation, spatial reasoning

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country: Europe > Germany (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)

Human-Driven Spatial Language for Human-Robot Interaction

Skubic, Marjorie (University of Missouri) | Huo, Zhiyu (University of Missouri) | Carlson, Laura (University of Notre Dame) | Li, Xiao Ou (University of Notre Dame) | Miller, Jared (University of Notre Dame)

AAAI Conferences, Aug-8-2011

This extended abstract outlines a new study that investigates spatial language for use in human-robot communication. The scenario studied is a home setting in which an elderly resident has misplaced an object, such as eyeglasses, and a robot helps the resident find it. We present preliminary results from the initial study, in which we investigate spatial language generated for a human addressee or a robot addressee in a virtual environment.

addressee, artificial intelligence, robot

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Missouri (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.63)
